Skip to content

Conversation

@tiferet
Copy link
Contributor

@tiferet tiferet commented Nov 18, 2022

This PR simplifies the query configurations. It moves common functionality from the derived classes up to AtmConfig. It also combines the two configs each query previously had into a single config, simplifying the code and making it easier to understand. This will prevent code duplication and bugs, improve maintainability, and make it easier to add new queries.

Once again commit-by-commit review is recommended 😄

Main changes

  • AtmConfig inherits from TaintTracking::Configuration. That way the specific query configs (e.g. DomBasedXssAtmConfig), which inherit from AtmConfig, also inherit from TaintTracking::Configuration. This removes the need for two separate config classes for each query (e.g. DomBasedXssAtmConfig and XssATM::Configuration). Note: We could also keep AtmConfig as is and have the query configs inherit both from it and from TaintTracking::Configuration directly, but I think having AtmConfig inherit from TaintTracking::Configuration makes more sense: it's a query configuration for a boosted taint tracking query, which is a type of taint tracking query that adds a few extra predicates (e.g. isKnownSink).
  • Move the definition of isSource to the base class: A long as we're not boosting sources, isSource is identical to isKnownSource.
  • Move the definition of isSink to the base class: Holds if sink is a known taint sink or an "effective" sink (a candidate to be classified by an ML model).
  • Remove code duplication in query .ql files: Define the query for finding ATM alerts in the base class AtmConfig, and call it from each query's .ql file.

Consistency check

This DCA experiment uses the model on main with the final commit of each PR from the CodeQL revamp. The idea is to double-check that none of the changes I've made has affected the endpoints that get scored at inference time, since this is not directly tested for in the PR checks. We see identical results from all variants (e.g. this), confirming that none of the revamp work has changed the endpoints being scored.

Timing checks

Addresses most of https://github.com/github/ml-ql-adaptive-threat-modeling/issues/2134.

@github-actions github-actions bot added the ATM label Nov 18, 2022
@tiferet tiferet marked this pull request as ready for review November 18, 2022 20:09
@tiferet tiferet requested review from a team and kaeluka and removed request for a team November 18, 2022 20:09
@tiferet tiferet force-pushed the tiferet/complexity-reduction branch from e6c76a5 to 9a37061 Compare November 19, 2022 00:08
@tiferet tiferet force-pushed the tiferet/simplify-configs branch from a543d95 to d8ef142 Compare November 19, 2022 00:10
@tiferet tiferet force-pushed the tiferet/complexity-reduction branch from 9a37061 to fac6641 Compare November 21, 2022 16:34
@tiferet tiferet force-pushed the tiferet/simplify-configs branch from d8ef142 to 42cc1bc Compare November 21, 2022 16:46
@owen-mc owen-mc changed the title Simplify query configurations ATM: Simplify query configurations Nov 22, 2022
@tiferet
Copy link
Contributor Author

tiferet commented Nov 22, 2022

@kaeluka when you review this PR, do you mind looking at the possible issue Henry raised here. I think it's OK, but I have limited understanding of the issue he's pointing out. Henry said From a very very quick scan, it looks OK, but I’d recommend taking a closer look. I'd appreciate your eyes on it as well to make sure there isn't a problem.

@kaeluka
Copy link

kaeluka commented Nov 24, 2022

Thank you for point this out! I was going to, but I definitely want to talk to Henry about it as well!

@kaeluka
Copy link

kaeluka commented Nov 28, 2022

As part of digging into the questions brought up by Henry, I added the following snippet to each file modified in this PR and quick-evaluated it:

predicate test(DataFlow::Configuration a) {
  any()
}

This showed me that there's actually a few files where there's several instances of DataFlow::Configuration in scope. I believe that, in each case, the situation is not preventing the PR from being merged.

These files are:

  • EndpointFeatures.ql - benign because the query only computes features, rather than evaluating flow.
  • FilteredTruePositives.ql - benign because the query only runs queries involving characteristics, rather than evaluating flow (and it's just a unit test).
  • ExtractEndpointMapping.ql - benign because the query does not evaluate flow
  • DebugResultInclusion.ql - this might actually be slowed down because it uses hasFlow but a) it's only a debug query, and b) the situation is like that already on main (the PR only changes some names in the file).

@henrymercer, is my reasoning sound?

@henrymercer
Copy link
Contributor

@henrymercer, is my reasoning sound?

Your reasoning around the files you listed looks good. Most of those files are test or debug files. I'm not sure how you handled the case of libraries being changed, as we'd need to check all the files that use this library and not just the ones being changed.

I did try evaluating the predicate top-down on each of the queries and they each reported a single dataflow configuration, so I think we're good to go.

@kaeluka
Copy link

kaeluka commented Nov 28, 2022

Perfect! I'll continue with the conventional part of the review :) This should be done ~tomorrow lunchtime, @tiferet. Only minor comments so far, but will take some time to think about the big picture tomorrow.

@tiferet tiferet force-pushed the tiferet/complexity-reduction branch from fac6641 to 4580b55 Compare November 28, 2022 19:36
@tiferet
Copy link
Contributor Author

tiferet commented Nov 28, 2022

@kaeluka Since you're already in the middle of reviewing this PR, I don't want to rebase it on top of the latest version of tiferet/complexity-reduction and interfere with your ongoing work. Please LMK what your preference is. Should I:

  • Rebase on top of tiferet/complexity-reduction now?
  • Wait until tiferet/complexity-reduction is merged to main and then rebase on top of main?
  • Wait until tiferet/complexity-reduction is merged to main and then merge main into this branch?
  • Something else?

kaeluka
kaeluka previously approved these changes Nov 29, 2022
Copy link

@kaeluka kaeluka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The big question, whether this might introduce a pathological dependency graph, here has been tackled in the PR discussion already.

👍: The rest of the review is that I:

  • Like the change and am not asking you to make any big changes.
  • I'd slightly prefer to follow convention in the naming of the Config classes but it's not a hill I'll die on.
  • I was a bit surprised to see the PR, as it seems like it's a bit of a side track from the grand design we're following. Happy to see it either way, though!

Base automatically changed from tiferet/complexity-reduction to main November 29, 2022 21:17
All remaining functionality in `CoreKnowledge` is only being used in `EndpointCharacteristics`, so it can be moved there as a small set of helper predicates.
All remaining functionality in `StandardEndpointFilters` is only being used in `EndpointCharacteristics`, so it can be moved there as a small set of helper predicates.
That way the specific configs which inherit from `AtmConfig` also inherit from `TaintTracking::Configuration`.

This removes the need for two separate config classes for each query.
A long as we're not boosting sources, `isSource` is identical to `isKnownSource`.
Holds if `sink` is a known taint sink or an "effective" sink.
Define the query for finding ATM alerts in the base class `AtmConfig`, and call it from each query's .ql file.
@tiferet tiferet force-pushed the tiferet/simplify-configs branch from 6813358 to 6f807e9 Compare November 29, 2022 21:22
Name the query configuration e.g. `NosqlInjectionATMConfig` rather than `Configuration`.
@tiferet tiferet requested a review from kaeluka November 29, 2022 23:49
@tiferet
Copy link
Contributor Author

tiferet commented Nov 29, 2022

I was a bit surprised to see the PR, as it seems like it's a bit of a side track from the grand design we're following. Happy to see it either way, though!

Yes, I think we're already done implementing the design. All the issues left here are improvements we can make either to the code or to the data now that the refactoring is complete 🎉

* to this ML-boosted configuration, whereas the unboosted base query does not contain this source and sink
* combination.
*/
predicate getAlerts(JS::DataFlow::PathNode source, JS::DataFlow::PathNode sink, float score) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor/bikeshed: I'd prefer a name that doesn't suggest a return/result value, and mentions flow. Also makes it easier to see the parallels between unboosted and boosted queries. How about hasLikelyFlowPath?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know about hasLikelyFlowPath, because one of the conditions it must meet is hasFlowPath. Logically hasFlowPath should imply hasLikelyFlowPath.

Maybe something like hasBoostedFlowPath? We've been trying to get rid of the "boosted" terminology, but I can't think of something better right now. I think hasAlert would be clearer (ultimately that's what this predicate does -- it surfaces all the alerts for this config), but that doesn't mention flow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I tracked this renaming together with the other renamings here, so that we can merge this PR and XssThroughDOM and open a separate PR for all the renamings

@tiferet tiferet merged commit e2e3667 into main Nov 30, 2022
@tiferet tiferet deleted the tiferet/simplify-configs branch November 30, 2022 01:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants